NSF PAR Search | NSF Public Access Repository

Foundation model for mass spectrometry proteomics

Sanders, Justin; Yilmaz, Melih; Russell, Jacob H; Bittremieux, Wout; Fondrie, William E; Riley, Nicholas M; Oh, Sewoong; Noble, William Stafford (May 2025, https://doi.org/10.48550/arXiv.2505.10848)

Mass spectrometry is the dominant technology in the field of proteomics, enabling high-throughput analysis of the protein content of complex biological samples. Due to the complexity of the instrumentation and resulting data, sophisticated computational methods are required for the processing and interpretation of acquired mass spectra. Machine learning has shown great promise to improve the analysis of mass spectrometry data, with numerous purpose-built methods for improving specific steps in the data acquisition and analysis pipeline reaching widespread adoption. Here, we propose unifying various spectrum prediction tasks under a single foundation model for mass spectra. To this end, we pre-train a spectrum encoder using de novo sequencing as a pre-training task. We then show that using these pre-trained spectrum representations improves our performance on the four downstream tasks of spectrum quality prediction, chimericity prediction, phosphorylation prediction, and glycosylation status prediction. Finally, we perform multi-task fine-tuning and find that this approach improves the performance on each task individually. Overall, our work demonstrates that a foundation model for tandem mass spectrometry proteomics trained on de novo sequencing learns generalizable representations of spectra, improves performance on downstream tasks where training data is limited, and can ultimately enhance data acquisition and analysis in proteomics experiments.
Free, publicly-accessible full text available May 19, 2026
Accounting for Digestion Enzyme Bias in Casanovo

https://doi.org/10.1021/acs.jproteome.4c00422

Melendez, Carlo; Sanders, Justin; Yilmaz, Melih; Bittremieux, Wout; Fondrie, William E; Oh, Sewoong; Noble, William Stafford (October 2024, Journal of Proteome Research)

Full Text Available
Deep Learning Methods for De Novo Peptide Sequencing

https://doi.org/10.1002/mas.21919

Bittremieux, Wout; Ananth, Varun; Fondrie, William E; Melendez, Carlo; Pominova, Marina; Sanders, Justin; Wen, Bo; Yilmaz, Melih; Noble, William S (November 2024, Mass Spectrometry Reviews)

Protein tandem mass spectrometry data are most often interpreted by matching observed mass spectra to a protein database derived from the reference genome of the sample being analyzed. In many application domains, however, a relevant protein database is unavailable or incomplete, and in such settings de novo sequencing is required. Since the introduction of the DeepNovo algorithm in 2017, the field of de novo sequencing has been dominated by deep learning methods, which use large amounts of labeled mass spectrometry data to train multi‐layer neural networks to translate from observed mass spectra to corresponding peptide sequences. Here, we describe these deep learning methods, outline procedures for evaluating their performance, and discuss the challenges in the field, both in terms of methods development and evaluation protocols.
Evaluation of a Wastewater-Based Epidemiological Approach to Estimate the Prevalence of SARS-CoV-2 Infections and the Detection of Viral Variants in Disparate Oregon Communities at City and Neighborhood Scales

https://doi.org/10.1289/EHP10289

Layton, Blythe A.; Kaya, Devrim; Kelly, Christine; Williamson, Kenneth J.; Alegre, Dana; Bachhuber, Silke M.; Banwarth, Peter G.; Bethel, Jeffrey W.; Carter, Katherine; Dalziel, Benjamin D.; et al (June 2022, Environmental Health Perspectives)

Full Text Available
